Da 1766 ironing the testing pipeline for spider2.0 testing of cb mcp server#1

Open
Arjunnr-cb wants to merge 9 commits into main from DA-1766-ironing-the-testing-pipeline-for-spider2.0-testing-of-cb-mcp-server

Conversation

@Arjunnr-cb
Collaborator

This pull request introduces a pipeline for end-to-end testing and evaluation of the Couchbase MCP server with SQL++ query generation and analysis. It adds a Makefile-based workflow, a structured configuration system, and robust automation for setup, execution, and logging. The changes focus on making the pipeline easy to configure, run, and debug, while supporting multiple datasets and environments.


Copilot AI left a comment

Pull request overview

This PR adds an end-to-end, Makefile-driven pipeline to generate SQL++ via the Couchbase MCP server, postprocess the generated queries, and evaluate/analyze results against a Couchbase cluster, with configuration flowing from config.json → .env.

Changes:

  • Introduces make setup/run/quicktest workflow and a scripts/setup.py generator to produce .env from config.json.
  • Updates run_mcp.sh to load .env, start iQ-FastAPI automatically, run MCP generation, postprocess, and evaluate/analyze into timestamped runs/<IST_timestamp>[_tag]/.
  • Adds a Snowflake question set file and supporting docs/ignore rules.
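The `runs/<IST_timestamp>[_tag]/` naming used by `run_mcp.sh` can be illustrated with a minimal Python sketch. The timestamp format (`%Y%m%d_%H%M%S`) and the helper names `run_dir_name` and `make_run_dir` are assumptions for illustration; the PR does not state the exact format the shell script uses.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# IST is UTC+05:30 with no daylight saving, so a fixed offset is sufficient.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def run_dir_name(now: datetime, tag: str = "") -> str:
    """Return '<IST_timestamp>' or '<IST_timestamp>_<tag>'.

    The timestamp format is an assumption, not taken from run_mcp.sh.
    """
    stamp = now.astimezone(IST).strftime("%Y%m%d_%H%M%S")
    return f"{stamp}_{tag}" if tag else stamp


def make_run_dir(root: Path, tag: str = "") -> Path:
    """Create and return a fresh run directory under `root` (hypothetical helper)."""
    run_dir = root / run_dir_name(datetime.now(timezone.utc), tag)
    # exist_ok=False makes an accidental reuse of a run directory fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Taking the current time in UTC and converting explicitly keeps the directory name independent of the machine's local timezone.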

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 10 comments.

| File | Description |
|------|-------------|
| `test/run.py` | Makes MCP server path/venv/log path configurable via environment variables and remaps `MCP_CB_*` to `CB_*`. |
| `test/questions_snowflake.jsonl` | Adds Snowflake question set (currently JSONL format). |
| `scripts/setup.py` | Adds generator to create `.env` from `config.json` values. |
| `run_mcp.sh` | Adds `.env` loading, iQ-FastAPI lifecycle management, run naming, stale output cleanup, postprocess step, and updated evaluation invocation. |
| `Makefile` | Adds `setup`, `run`, and `quicktest` targets driven by `.env` variables. |
| `docs/testing-pipeline-for-semantic-catalog.md` | Documents setup/configuration and pipeline flow. |
| `config_example.json` | Adds structured config template for required paths/credentials and pipeline defaults. |
| `.gitignore` | Ignores generated artifacts (`runs/`, `test/output/`) and `config.json`. |
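The `MCP_CB_*` to `CB_*` remapping described for `test/run.py` might look roughly like the sketch below. The function name `remap_cb_env`, the example variable name `MCP_CB_USERNAME`, and the choice to let the remapped value override any existing `CB_*` entry are assumptions; the PR only states that the remap happens.

```python
import os
from typing import Mapping


def remap_cb_env(
    env: Mapping[str, str],
    src_prefix: str = "MCP_CB_",
    dst_prefix: str = "CB_",
) -> dict[str, str]:
    """Return a copy of `env` with a CB_* alias for every MCP_CB_* variable."""
    child_env = dict(env)
    for key, value in env.items():
        if key.startswith(src_prefix):
            # Assumption: the MCP_CB_* value wins over an existing CB_* value.
            child_env[dst_prefix + key[len(src_prefix):]] = value
    return child_env


if __name__ == "__main__":
    # Build the environment that would be handed to the MCP server subprocess.
    server_env = remap_cb_env(os.environ)
    print(sorted(k for k in server_env if k.startswith("CB_")))
```

Returning a new dictionary rather than mutating `os.environ` keeps the remap scoped to the child process.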

Comment thread run_mcp.sh
exit 1
fi

# Python for evaluation scripts (needs pandas, couchbase, tqdm — use iQ-FastAPI venv)
Comment thread run_mcp.sh
Comment on lines 200 to 204
echo " (limit: $LIMIT questions)"
fi
"$PYTHON_BIN" "${TEST_DIR}/run.py" --questions_file "$QUESTIONS_FILE" $LIMIT_ARG
MCP_SERVER_LOG_FILE="${LOG_DIR}/mcp_server.log" \
"${MCP_SERVER_VENV_PATH}/bin/python" "${TEST_DIR}/run.py" --questions_file "$QUESTIONS_FILE" $LIMIT_ARG
echo ""
Comment thread run_mcp.sh
if [[ "$LIMIT" -gt 0 ]]; then
LIMIT_ARG="--limit $LIMIT"
echo " (limit: $LIMIT questions)"
fi
Comment thread run_mcp.sh
Comment on lines +228 to 233
"$EVAL_PYTHON" "${EVAL_DIR}/evaluate_sqlpp_catalog.py" \
--result_dir "$SUBMISSION_DIR" \
--gold_dir "$GOLD_DIR" \
--max_workers "$MAX_WORKERS" \
--timeout "$TIMEOUT" \
2>&1 | tee "${LOG_DIR}/evaluate.log"
--timeout "$TIMEOUT"

Comment thread scripts/setup.py
Comment on lines +38 to +43
for var in group["variables"]:
key = var["key"]
value = var.get("value", "")
lines.append(f"{key}={value}")
if var.get("required") and not value:
missing_required.append(key)
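For context, the loop in this hunk could sit inside a generator along the following lines. This is a sketch under assumptions: the top-level `"groups"` key and the function names `render_env` and `write_env` are hypothetical, and only the inner loop is taken from the diff.

```python
import json
from pathlib import Path


def render_env(config: dict) -> tuple[list[str], list[str]]:
    """Return (.env lines, keys that are required but empty)."""
    lines: list[str] = []
    missing_required: list[str] = []
    # "groups" is an assumed top-level key; the diff only shows group["variables"].
    for group in config.get("groups", []):
        for var in group["variables"]:
            key = var["key"]
            value = var.get("value", "")
            lines.append(f"{key}={value}")
            if var.get("required") and not value:
                missing_required.append(key)
    return lines, missing_required


def write_env(config_path: Path, env_path: Path) -> list[str]:
    """Write the .env file and return the missing required keys."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    lines, missing = render_env(config)
    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return missing
```

Returning the missing keys lets the caller decide whether to warn or abort. Note that values are written unquoted, so a value containing spaces or `#` may not survive being sourced by a shell.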
Comment thread test/run.py
def _resolve_server_dir() -> Path:
env_path = os.environ.get("MCP_SERVER_PATH", "").strip()
if env_path:
return Path(env_path)
Comment on lines +90 to +95
| `PIPELINE_DATASET` | File |
|--------------------|------|
| `sqlite` | `test/questions_sqlite.json` |
| `snowflake` | `test/questions_snowflake.json` |
| `bird` | `test/questions_bird.json` |
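A lookup for this mapping could be sketched as follows. The paths are copied from the table as quoted; note that the file summary lists the Snowflake file as `test/questions_snowflake.jsonl`, so the extension here may need adjusting. The names `QUESTION_FILES` and `questions_file_for` are hypothetical.

```python
from pathlib import Path

# Mapping copied from the documentation table under review.
QUESTION_FILES = {
    "sqlite": "test/questions_sqlite.json",
    "snowflake": "test/questions_snowflake.json",
    "bird": "test/questions_bird.json",
}


def questions_file_for(dataset: str, repo_root: Path = Path(".")) -> Path:
    """Resolve a PIPELINE_DATASET value to its questions file path."""
    name = dataset.strip().lower()
    if name not in QUESTION_FILES:
        known = ", ".join(sorted(QUESTION_FILES))
        raise ValueError(f"Unknown PIPELINE_DATASET {dataset!r}; expected one of: {known}")
    return repo_root / QUESTION_FILES[name]
```

Failing fast on an unknown dataset name avoids starting the pipeline against a questions file that does not exist.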

├── 1. Start iQ-FastAPI (query generation backend)
├── 2. Run MCP client → generate SQL++ for each question
├── 3. Copy .sqlpp files to submission/
Comment on lines +1 to +16
{"instance_id": "sf_bq029", "db": "PATENTS", "question": "Get the average number of inventors per patent and the total count of patent publications in Canada (CA) for each 5-year period from 1960 to 2020, based on publication dates. Only include patents that have at least one inventor listed, and group results by 5-year intervals (1960-1964, 1965-1969, etc.).", "external_knowledge": null}
{"instance_id": "sf_bq026", "db": "PATENTS", "question": "For the assignee who has been the most active in the patent category 'A61', I'd like to know the five patent jurisdictions code where they filed the most patents during their busiest year, separated by commas.", "external_knowledge": null}
{"instance_id": "sf_bq091", "db": "PATENTS", "question": "In which year did the assignee with the most applications in the patent category 'A61' file the most?", "external_knowledge": null}
{"instance_id": "sf_bq099", "db": "PATENTS", "question": "For patent class A01B3, I want to analyze the information of the top 3 assignees based on the total number of applications. Please provide the following five pieces of information: the name of this assignee, total number of applications, the year with the most applications, the number of applications in that year, and the country code with the most applications during that year.", "external_knowledge": null}
{"instance_id": "sf_bq033", "db": "PATENTS", "question": "How many U.S. publications related to IoT (where the abstract includes the phrase 'internet of things') were filed each month from 2008 to 2022, including months with no filings?", "external_knowledge": null}
{"instance_id": "sf_bq209", "db": "PATENTS", "question": "Can you calculate the number of utility patents that were granted in 2010 and have exactly one forward citation within a 10-year window following their application/filing date? For this analysis, forward citations should be counted as distinct citing application numbers that cited the patent within 10 years after the patent's own filing date.", "external_knowledge": null}
{"instance_id": "sf_bq027", "db": "PATENTS", "question": "For patents granted between 2010 and 2018, provide the publication number of each patent and the number of backward citations it has received in the SEA category.", "external_knowledge": null}
{"instance_id": "sf_bq210", "db": "PATENTS", "question": "How many US B2 patents granted between 2008 and 2018 contain claims that do not include the word 'claim'?", "external_knowledge": null}
{"instance_id": "sf_bq211", "db": "PATENTS", "question": "Among patents granted between 2010 and 2023 in CN, how many of them belong to families that have a total of over one distinct applications?", "external_knowledge": null}
{"instance_id": "sf_bq213", "db": "PATENTS", "question": "What is the most common 4-digit IPC code among US B2 utility patents granted from June to August in 2022?", "external_knowledge": "patents_info.md"}
{"instance_id": "sf_bq212", "db": "PATENTS", "question": "For United States utility patents under the B2 classification granted between June and September of 2022, identify the most frequent 4-digit IPC code for each patent. Then, list the publication numbers and IPC4 codes of patents where this code appears 10 or more times.", "external_knowledge": "patents_info.md"}
{"instance_id": "sf_bq214", "db": "PATENTS_GOOGLE", "question": "For United States utility patents under the B2 classification granted between 2010 and 2014, find the one with the most forward citations within a month of its filing date, and identify the most similar patent from the same filing year, regardless of its type.", "external_knowledge": "patents_info.md"}
{"instance_id": "sf_bq216", "db": "PATENTS_GOOGLE", "question": "Identify the top five patents filed in the same year as `US-9741766-B2` that are most similar to it based on technological similarities. Please provide the publication numbers.", "external_knowledge": "patents_info.md"}
{"instance_id": "sf_bq247", "db": "PATENTS_GOOGLE", "question": "From the publications dataset, first identify the top six families with the most publications whose family_id is not '-1'. Then, using the abs_and_emb table (joined on publication_number), provide each of those families\u2019 IDs alongside every non-empty abstract associated with their publications.", "external_knowledge": null}
{"instance_id": "sf_bq127", "db": "PATENTS_GOOGLE", "question": "For each publication family whose earliest publication was first published in January 2015, please provide the earliest publication date, the distinct publication numbers, their country codes, the distinct CPC and IPC codes, distinct families (namely, the ids) that cite and are cited by this publication family. Please present all lists as comma-separated values, sorted alphabetically", "external_knowledge": null}
{"instance_id": "sf_bq215", "db": "PATENTS", "question": "Which US patent (with a B2 kind code and a grant date between 2015 and 2018) has the highest originality score calculated as 1 - (the sum of squared occurrences of distinct 4-digit IPC codes in its backward citations divided by the square of the total occurrences of these 4-digit IPC codes)?", "external_knowledge": "patents_info.md"}
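Because this file holds one JSON object per line (JSONL), it cannot be parsed with a single `json.load` call. A minimal reader sketch is shown below; the function name `iter_questions` and the convention that `limit <= 0` means no limit are assumptions, not taken from `test/run.py`.

```python
import json
from typing import Iterable, Iterator


def iter_questions(lines: Iterable[str], limit: int = 0) -> Iterator[dict]:
    """Yield one question dict per non-empty JSONL line.

    A limit of 0 or less means "no limit" (an assumed convention).
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)
        count += 1
        if limit > 0 and count >= limit:
            break
```

An open file handle can be passed directly, for example `with open(path, encoding="utf-8") as f: questions = list(iter_questions(f, limit=5))`.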
Comment thread config_example.json
},
{
"key": "PIPELINE_TAG",
"description": "Run name suffix appended to the runs/ folder name (e.g. v2 \u2192 runs/mcp_v2/)",
2 participants